Abstract: We consider a linear Gaussian noise channel used
with delayed feedback. The channel noise is assumed to be
an ARMA (autoregressive and/or moving average) process. We
reformulate the Gaussian noise channel into an intersymbol
interference channel with white noise, and show that the
delayed feedback of the original channel is equivalent to the
instantaneous feedback of the derived channel. By generalizing
results previously developed for Gaussian channels with
instantaneous feedback and applying them to the derived intersymbol
interference channel, we show that, conditioned on the delayed
feedback, a conditional Gauss-Markov source achieves the
feedback capacity and its Markov memory length is determined by
the noise spectral order and the feedback delay. A Kalman-Bucy
filter is shown to be optimal for processing the feedback. The
maximal information rate for stationary sources is derived in
terms of the channel input power constraint and the steady-state
solution of the Riccati equation of the Kalman-Bucy filter used
in the feedback loop.

I. Introduction 

For Gaussian noise channels used with feedback, the channel
capacity has been characterized in various aspects. For
memoryless channels, Shannon [1] showed that feedback does
not increase the capacity, and Schalkwijk and Kailath [2]
proposed a capacity-achieving feedback code. For channels
with memory, bounds have been developed for the feedback 
capacity [3], [4], [5], [6], [7]. In [8], the optimal feedback 
source distribution is derived in terms of a state-space channel 
representation and Kalman filtering. The maximal information 
rate for stationary sources is derived in an analytically explicit 
form in [9]. For first order moving-average (MA) Gaussian 
noise channels, the feedback capacity is achieved by stationary 
sources as shown in [10]. 

Here we consider a Gaussian noise channel used with delayed
feedback under an average-input-power constraint. Compared
to the instantaneous feedback case, fewer results have
been obtained on channels with delayed feedback. Yanagi [11] 
derived an upper bound on the finite block length delayed 
feedback capacity. In [12], it was shown that delayed feedback 
capacity for finite-state machine channels can be determined 
based on a method developed for instantaneous feedback by 
augmenting the channel state to account for feedback delay. 

We first reformulate the Gaussian noise channel with
delayed feedback into an equivalent state-space channel model
with instantaneous feedback and white noise. The delayed-feedback
information rate of the original Gaussian noise channel
equals the instantaneous-feedback information rate of the
derived state-space channel. By generalizing the methodology
and results derived in [8], [9], we show that

1) a feedback-dependent Gauss-Markov source is optimal 
for achieving the delayed-feedback capacity, and the 
necessary Markov memory length equals the larger of 

a) the moving average (MA) noise spectral order, and 

b) the sum of the feedback delay and the autoregressive
(AR) noise spectral order;

2) a state estimator (Kalman-Bucy filter) for the derived 
state-space channel model is optimal for processing 
the (delayed) feedback information, and the solution of 
its steady-state Riccati equation delivers the maximal 
information rate for stationary sources. 
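
As a quick illustration of item 1), the required Markov memory length can be computed directly from the two spectral orders and the delay. The sketch below only restates that rule in code; the function name `markov_memory_length` and its argument names are hypothetical and do not appear in the paper.

```python
def markov_memory_length(ma_order: int, ar_order: int, delay: int) -> int:
    """Return max(M, K + nu): the larger of the MA order and (AR order + delay)."""
    if min(ma_order, ar_order, delay) < 0:
        raise ValueError("orders and delay must be non-negative")
    return max(ma_order, ar_order + delay)


# First-order MA noise (M = 1, K = 0) used with a feedback delay of 3
print(markov_memory_length(ma_order=1, ar_order=0, delay=3))  # prints 3
```

In this example the feedback delay, not the noise order, dictates the memory length of the source.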

Notation: Random variables are denoted by upper-case
letters, e.g., $X_t$, and their realizations are denoted using lower-case
letters, e.g., $x_t$. A sequence $x_i, x_{i+1}, \ldots, x_j$ is
denoted by $x_i^j$. The letter $E$ stands for the expectation. The
differential entropy of a random variable $X$ is denoted by
$h(X)$. Bold uppercase letters stand for matrices (e.g., $\mathbf{K}$),
while underlined letters stand for column vectors (e.g., $\underline{c}$).

II. Channel Model Reformulation 

Let $X_t$ be the channel input at time $t$, and let $R_t$ be the channel
output at time $t$. We start by considering a Gaussian noise channel

$$R_t = X_t + N_t. \qquad (1)$$

The noise $N_t$ is assumed to be an autoregressive moving-average
(ARMA) Gaussian random process with a rational
power spectrum

The coefficients $a_m$ and $c_k$ are the spectral poles and zeros,
and $M$ and $K$ indicate the orders of the moving-average
(MA) and autoregressive (AR) noise power spectral components,
respectively. Since the poles and zeros of (2) appear in
pairs symmetric with respect to the unit circle [13], without
loss of generality, we may assume that $|a_m| < 1$ and $|c_k| < 1$.
Hence, the filter defined by 



H(z) = 1 - 




(3) 



and its inverse are both causal, stable and invertible. 

We make the following assumptions on the channel usage: 

1) The power of the channel input process is constrained$^1$:
$\lim_{n\to\infty} E\left[\sum_{t=1}^{n} X_t^2\right]/n = P$.

2) Let v > 1 be the feedback delay. The prior channel 
outputs R_<£ are known to the transmitter (via the 
feedback loop) before the transmission of X t . 

3) Transmission starts at time $t = 1$, i.e., $X_t = 0$ for
$t \le 0$. Thus, the noise history $N_{-\infty}^{0}$ is known to both the
transmitter and receiver.

Since the filter $H(z)$ is invertible, we may apply $H^{-1}(z)$ to
the channel output $R_t$ without changing the channel capacity.
The equivalent intersymbol interference (ISI) channel has $X_t$
as the channel input, $U_t$ as the channel output, and white
Gaussian noise $V_t$ (with power $\sigma_W^2$):

$$U(z) = H^{-1}(z)\,(X(z) + N(z)) = H^{-1}(z) X(z) + V(z). \qquad (4)$$

The original channel outputs $R_{-\infty}^{t}$ can be determined
from $U_{-\infty}^{t}$ using the filter $H(z)$.
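
The invertibility argument above can be checked numerically. The following is a minimal sketch, assuming that $H(z)$ is a ratio of two polynomials in $z^{-1}$ with leading coefficient 1 and all roots inside the unit circle; the helper names `rational_filter` and `whiten` and the coefficient values are illustrative assumptions, not taken from the paper. It colours white noise with the filter and then recovers it with the inverse filter.

```python
import random


def rational_filter(num, den, x):
    """Causal filter with transfer function num(z)/den(z), zero initial state.

    num and den hold coefficients in powers of z^-1 with num[0] = den[0] = 1,
    so y[t] = sum_i num[i] x[t-i] - sum_{j>=1} den[j] y[t-j].
    """
    y = []
    for t in range(len(x)):
        acc = sum(num[i] * x[t - i] for i in range(len(num)) if t - i >= 0)
        acc -= sum(den[j] * y[t - j] for j in range(1, len(den)) if t - j >= 0)
        y.append(acc)
    return y


def whiten(num, den, n):
    """Apply the inverse filter den(z)/num(z) to undo rational_filter(num, den, .)."""
    return rational_filter(den, num, n)


random.seed(0)
v = [random.gauss(0.0, 1.0) for _ in range(200)]  # white Gaussian noise V_t
num = [1.0, -0.5]  # assumed numerator, root at 0.5 (inside the unit circle)
den = [1.0, -0.3]  # assumed denominator, root at 0.3 (inside the unit circle)
n = rational_filter(num, den, v)   # coloured (ARMA) noise N_t
v_hat = whiten(num, den, n)        # white noise recovered from N_t
max_err = max(abs(a - b) for a, b in zip(v, v_hat))
```

Because both the filter and its inverse are causal and stable, the recovered sequence matches the original white noise up to floating-point rounding, which is why applying the inverse filter at the receiver loses no information.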

To simplify notation for deriving the delayed information
rate, we change variables $Y_t = U_{t-\nu}$ and $W_t = V_{t-\nu}$, and
further reformulate the ISI channel in terms of $X_t$ and $Y_t$ as

$$Y(z) = z^{-\nu} U(z) = z^{-\nu} H^{-1}(z) X(z) + W(z). \qquad (5)$$

The ISI channel (5) with input $X_t$ and output $Y_t = U_{t-\nu}$ is
depicted in Fig. 1. The channel is completely characterized by
the tap coefficients $a_m$, $c_k$ and $\nu$. Without loss of generality$^2$,
we can assume $M = K + \nu$, and denote

$$\underline{c} = [\underbrace{0, \ldots, 0}_{\nu - 1\ \text{zeros}}, 1, c_1, c_2, \ldots, c_K]^T.$$

The channel depicted in Fig. 1 has a state-space representation.
Let the vector of values stored in the channel memory,
i.e., $\underline{S}_t = [S_t(1), S_t(2), \ldots, S_t(M)]^T$, be the channel state
vector. The state-space channel equations are

where $W_t$ is white Gaussian noise with variance $\sigma_W^2$. The
constant square matrix $\mathbf{A}$ and vector $\underline{b}$ are defined as

Fig. 1. Equivalent state space model for Gaussian channels with delayed
feedback.

From channel assumptions 1)-3), we have the following:



I) Since $X_t = 0$ for $t \le 0$, the initial channel state $\underline{S}_0 = \underline{s}_0$
is known to both the transmitter and the receiver.

II) The sequences $\underline{S}_1^t$ and $X_1^t$ determine each other uniquely
according to the state equation.

III) Given the channel state $\underline{S}_{t-1} = \underline{s}_{t-1}$, the channel output
$Y_t$ is statistically independent of the channel states $\underline{S}_0^{t-2}$, $\underline{S}_t$
and outputs $Y_1^{t-1}$, that is

$$P_{Y_t | \underline{S}_0^t, Y_1^{t-1}}(y_t | \underline{s}_0^t, y_1^{t-1}) = P_{Y_t | \underline{S}_{t-1}}(y_t | \underline{s}_{t-1}). \qquad (10)$$

Since the variance of the process $W_t$ is $\sigma_W^2$, the conditional
differential entropy of the channel output equals

$$h(Y_t | \underline{S}_0^t, Y_1^{t-1}) = h(Y_t | \underline{S}_{t-1}) = \frac{1}{2}\log(2\pi e \sigma_W^2). \qquad (11)$$

IV) The instantaneous feedback of $Y_t$ in the above derived
channel is equivalent to the delayed feedback $R_{t-\nu}$ in
the original channel. Thus, we only need to consider the
following encoder $X_t = X_t(M, Y_1^{t-1})$, where $M$ is
the message to transmit. For the source distribution, the
channel input $X_t$ is causally dependent on all previous
channel states $\underline{S}_0^{t-1}$ and channel outputs $Y_1^{t-1}$



or equivalently in terms of the channel states as 



We only need to consider Gaussian sources [4]. 



$^1$ Since it has been shown [14] that the feedback capacity is a concave
function of $P$, it is not necessary to consider the inequality constraint
$\lim_{n\to\infty} E\left[\sum_{t=1}^{n} X_t^2\right]/n \le P$.

$^2$ If $M < K + \nu$, we let $a_{M+1} = 0, \ldots, a_{K+\nu} = 0$ and then redefine $M$
to be $K + \nu$. If $M > K + \nu$, we let $c_{K+\nu+1} = 0, \ldots, c_M = 0$ and
then redefine $K$ to be $M - \nu$.



III. Information Rate and Optimal Sources 

We note that in the derived state-space channel model, the
first channel output that carries non-zero signal is $Y_{1+\nu} = U_1$.

The information rate equals

In the following analysis, since the initial channel state $\underline{s}_0$ is
known according to the channel assumptions in Section II, for notational
simplicity we will not explicitly write the dependence on $\underline{s}_0$
when it is obvious.

We consider all feedback-dependent Gaussian sources defined
in (12) or (13),

$$\mathcal{P} = \{P_t(\underline{s}_t | \underline{s}_0^{t-1}, y_1^{t-1}),\ t = 1, 2, \ldots\}, \qquad (20)$$

and the channel input is subject to the input power constraint

The following two theorems can be conveniently generalized
from [8], where they were originally derived for Gaussian
channels used with instantaneous feedback.

Theorem 1 (Gauss-Markov Sources are Optimal): For the
power-constrained linear Gaussian channel, a feedback-dependent
Gauss-Markov source

$$\mathcal{P}^{GM} \triangleq \{P_t(\underline{s}_t | \underline{s}_{t-1}, y_1^{t-1}),\ t = 1, 2, \ldots\} \qquad (22)$$

achieves the delayed-feedback channel capacity (proof in the
Appendix). □

By Theorem 1, without loss of optimality, in the sequel we
only consider feedback-dependent Gauss-Markov sources as
in (22).

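Definition 1: We use $\alpha_t(\cdot)$ as shorthand notation for the
posterior pdf of the channel state $\underline{S}_t$, that is

$$\alpha_t(\underline{\mu}) \triangleq P_{\underline{S}_t | \underline{S}_0, Y_1^t}(\underline{\mu} | \underline{s}_0, y_1^t), \qquad (23)$$

which is Gaussian due to Gaussian channel inputs. □

For a feedback-dependent Gauss-Markov source $\mathcal{P}^{GM}$, the
functions $\alpha_t(\cdot)$ can be recursively computed as

$$\alpha_t(\underline{\mu}) = \frac{\int \alpha_{t-1}(\underline{v})\, P_t(\underline{\mu} | \underline{v}, y_1^{t-1})\, P_{Y_t | \underline{S}_{t-1}, \underline{S}_t}(y_t | \underline{v}, \underline{\mu})\, d\underline{v}}{\iint \alpha_{t-1}(\underline{v})\, P_t(\underline{u} | \underline{v}, y_1^{t-1})\, P_{Y_t | \underline{S}_{t-1}, \underline{S}_t}(y_t | \underline{v}, \underline{u})\, d\underline{u}\, d\underline{v}}. \qquad (24)$$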



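The Gaussian function $\alpha_t(\cdot)$ is completely characterized by the
conditional mean $\underline{m}_t$ (a vector of dimension $M$) and the conditional
covariance matrix $\mathbf{K}_t$ (of dimension $M$ by $M$):

$$\underline{m}_t = E\left[\underline{S}_t \,|\, \underline{s}_0, y_1^t\right], \qquad (25)$$

$$\mathbf{K}_t = E\left[(\underline{S}_t - \underline{m}_t)(\underline{S}_t - \underline{m}_t)^T \,|\, \underline{s}_0, y_1^t\right]. \qquad (26)$$

We note that the recursion (24) can be implemented by a
Kalman-Bucy filter.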
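
To make the remark about the Kalman-Bucy filter concrete, the sketch below performs one update of the posterior mean and covariance. It assumes a state-space form that is inferred from the surrounding text rather than quoted from it: $\underline{S}_t = \mathbf{A}\underline{S}_{t-1} + \underline{b}X_t$, $Y_t = \underline{c}^T\underline{S}_{t-1} + W_t$, with a linear Gaussian input $X_t = \underline{d}^T\underline{S}_{t-1} + eZ_t + g$. The function name `kalman_step` and the list-based matrix helpers are hypothetical.

```python
def mat_vec(A, x):
    """Matrix-vector product for a list-of-lists matrix and a list vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    """Inner product of two list vectors."""
    return sum(a * b for a, b in zip(x, y))


def kalman_step(A, b, c, d, e, g, m, K, y, sigma2):
    """One posterior update (m_{t-1}, K_{t-1}, y_t) -> (m_t, K_t).

    Assumed model (inferred, not quoted from the paper):
        S_t = A S_{t-1} + b X_t,   X_t = d^T S_{t-1} + e Z_t + g,
        Y_t = c^T S_{t-1} + W_t,   W_t ~ N(0, sigma2).
    K is assumed symmetric.
    """
    M = len(m)
    # Closed-loop transition matrix Q = A + b d^T
    Q = [[A[i][j] + b[i] * d[j] for j in range(M)] for i in range(M)]
    Kc = mat_vec(K, c)            # K c
    QKc = mat_vec(Q, Kc)          # Q K c, the covariance between S_t and Y_t
    s = dot(c, Kc) + sigma2       # innovation variance c^T K c + sigma2
    innov = y - dot(c, m)         # y_t minus its predicted mean c^T m
    Qm = mat_vec(Q, m)
    m_new = [Qm[i] + b[i] * g + QKc[i] * innov / s for i in range(M)]
    QK = [[sum(Q[i][k] * K[k][j] for k in range(M)) for j in range(M)]
          for i in range(M)]
    QKQt = [[sum(QK[i][k] * Q[j][k] for k in range(M)) for j in range(M)]
            for i in range(M)]
    K_new = [[QKQt[i][j] + b[i] * b[j] * e * e - QKc[i] * QKc[j] / s
              for j in range(M)] for i in range(M)]
    return m_new, K_new


# Scalar example (M = 1) with illustrative numbers
m1, K1 = kalman_step(A=[[0.5]], b=[1.0], c=[1.0], d=[0.0], e=1.0, g=0.0,
                     m=[0.0], K=[[1.0]], y=2.0, sigma2=1.0)
```

Under these assumptions the covariance update does not depend on the observed value `y`, only the mean does, which is the usual property of a linear Gaussian filter.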

Theorem 2: For the power-constrained linear Gaussian
channel, the delayed-feedback capacity is achieved by a
feedback-dependent Gauss-Markov source $\mathcal{P}_\alpha^{GM}$ defined as

$$\mathcal{P}_\alpha^{GM} = \{P_t(\underline{s}_t | \underline{s}_{t-1}, \alpha_{t-1}(\cdot)),\ t = 1, 2, \ldots\}, \qquad (27)$$

where the Markov transition probability depends only on
the posterior distribution function of the derived channel
state $\alpha_t(\cdot)$ instead of all prior channel outputs (proof in the
Appendix). □

Theorem 2 suggests that, for the task of constructing the
next signal to be transmitted, all the "knowledge" contained in
the vector of prior channel outputs is captured by the posterior
distribution $\alpha_{t-1}(\cdot)$ of the channel state.

By Theorem 1 and Theorem 2, we only need to consider
a feedback-dependent Gauss-Markov source $\mathcal{P}_\alpha^{GM}$ as defined
in (27).

IV. Feedback Capacity Computation 

The delayed feedback capacity thus can be derived in a 
similar way as in [8], though the results slightly differ due to 
feedback delay. 

A. Source Parameterization 

Without loss of generality, a feedback-dependent Gauss-Markov
source $\mathcal{P}_\alpha^{GM}$ can be expressed as

$$X_t = \underline{d}_t^T \underline{S}_{t-1} + e_t Z_t + g_t, \qquad (28)$$

where $Z_t$ is a Gaussian random variable with zero mean and
unit variance that is independent of $X_1^{t-1}$ and $Y_1^{t-1}$,
and the vector $\underline{d}_t$ is of length $M$. The coefficients $\underline{d}_t$, $e_t$ and $g_t$
all depend on the Gaussian pdf $\alpha_{t-1}(\cdot)$, or alternatively
on its mean $\underline{m}_{t-1}$ and covariance matrix $\mathbf{K}_{t-1}$. The set
of coefficients $\{\underline{d}_t, e_t, g_t\}$ completely determines the transition
probabilities of the feedback-dependent Gauss-Markov
source $\mathcal{P}_\alpha^{GM}$ defined in (27).

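Lemma 1: For the feedback-dependent Gauss-Markov
source as parameterized in (28), we have

$$h(Y_t | \underline{s}_0, y_1^{t-1}) - \frac{1}{2}\log(2\pi e \sigma_W^2) = \frac{1}{2}\log\left(1 + \frac{\underline{c}^T \mathbf{K}_{t-1}\, \underline{c}}{\sigma_W^2}\right), \qquad (29)$$

and

$$E\left[(X_t)^2 \,|\, \underline{s}_0, y_1^{t-1}\right] = (\underline{d}_t^T \underline{m}_{t-1} + g_t)^2 + \underline{d}_t^T \mathbf{K}_{t-1}\, \underline{d}_t + (e_t)^2, \qquad (30)$$

where the values of $\underline{d}_t$, $e_t$, $g_t$ depend on $\underline{m}_{t-1}$ and $\mathbf{K}_{t-1}$. □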



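Proof: The first and second order moments of the channel
input $X_t$ and output $Y_t$ can be computed as

$$E\left[(X_t)^2 \,|\, \underline{s}_0, y_1^{t-1}\right] = (\underline{d}_t^T \underline{m}_{t-1} + g_t)^2 + \underline{d}_t^T \mathbf{K}_{t-1}\, \underline{d}_t + (e_t)^2, \qquad (31)$$

$$E\left[Y_t \,|\, \underline{s}_0, y_1^{t-1}\right] = \underline{c}^T \underline{m}_{t-1}, \qquad (32)$$

$$E\left[\left(Y_t - E[Y_t \,|\, \underline{s}_0, y_1^{t-1}]\right)^2 \,|\, \underline{s}_0, y_1^{t-1}\right] = \underline{c}^T \mathbf{K}_{t-1}\, \underline{c} + \sigma_W^2. \qquad (33)$$

Conditioned on $\underline{s}_0$ and $y_1^{t-1}$, the variable $Y_t$ has a Gaussian
distribution with variance (33), thus we obtain (29). ■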
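
The two quantities in Lemma 1 are simple to evaluate once the posterior mean and covariance are available. The sketch below computes the right-hand sides of (29) and (30) for list-based vectors and matrices; the function names and the numerical values are illustrative assumptions, and the logarithm is taken in nats.

```python
import math


def quad_form(x, K, y):
    """Return x^T K y for list vectors x, y and a list-of-lists matrix K."""
    return sum(x[i] * K[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def step_information(c, K_prev, sigma2):
    """Right-hand side of (29): 0.5 * log(1 + c^T K_{t-1} c / sigma_W^2)."""
    return 0.5 * math.log(1.0 + quad_form(c, K_prev, c) / sigma2)


def input_power(d, e, g, m_prev, K_prev):
    """Right-hand side of (30): (d^T m + g)^2 + d^T K d + e^2."""
    mean = sum(di * mi for di, mi in zip(d, m_prev)) + g
    return mean ** 2 + quad_form(d, K_prev, d) + e ** 2


K_prev = [[2.0, 0.0], [0.0, 3.0]]
info = step_information(c=[0.0, 1.0], K_prev=K_prev, sigma2=1.0)
power = input_power(d=[1.0, 0.0], e=1.0, g=0.5, m_prev=[0.5, 0.0], K_prev=K_prev)
```

The information term depends on the source only through the covariance $\mathbf{K}_{t-1}$, whereas the power also depends on the offset $g_t$, which is the observation used in Lemma 2.
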
Lemma 2: The parameters of the optimal feedback-dependent
Gauss-Markov source must satisfy

$$g_t = -\underline{d}_t^T \underline{m}_{t-1}. \qquad (34)$$

Proof: By Lemma 1 and equation (17), the value of $g_t$
does not affect the information rate, but choosing $g_t$ as in (34)
minimizes the average input power for given $\underline{d}_t$ and $e_t$.

We note that this essentially follows the center-of-gravity
necessary condition for optimal sources as derived in [15]. ■
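
The power-minimizing role of the offset in Lemma 2 can be seen with a small numerical check. The sketch below evaluates the conditional input power of (30) over a grid of offsets around the value given by (34); the function name `conditional_power`, the grid, and the numbers are illustrative assumptions, and the term $\underline{d}^T\mathbf{K}\underline{d}$ is passed in as a precomputed scalar `dKd`.

```python
def conditional_power(d, e, g, m_prev, dKd):
    """Input power (d^T m + g)^2 + d^T K d + e^2, with d^T K d given as dKd."""
    mean = sum(di * mi for di, mi in zip(d, m_prev)) + g
    return mean ** 2 + dKd + e ** 2


d, m_prev, e, dKd = [0.8, -0.2], [1.0, 2.0], 0.5, 0.3
g_star = -sum(di * mi for di, mi in zip(d, m_prev))   # the choice in (34)
candidates = [g_star + 0.1 * k for k in range(-20, 21)]
best = min(candidates, key=lambda g: conditional_power(d, e, g, m_prev, dKd))
min_power = conditional_power(d, e, g_star, m_prev, dKd)
```

With this choice the squared-mean term vanishes and the power reduces to $\underline{d}^T\mathbf{K}\underline{d} + e^2$, the form that appears in the stationary analysis.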

B. Feedback Capacity for Stationary Sources 

Definition 2 (Stationary sources): A stationary feedback-dependent
(Gauss-Markov) source is a source that induces stationary
channel input and output processes. An asymptotically
stationary feedback-dependent (Gauss-Markov) source, in its
limit as $t \to \infty$, induces stationary channel input and output
processes. □

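Lemma 3: For a stationary (or asymptotically stationary)
feedback-dependent Gauss-Markov source, the covariance matrix
$\mathbf{K}_t$ and source coefficients $\underline{d}_t$ and $e_t$ converge, i.e.,

$$\lim_{t\to\infty} \mathbf{K}_t = \mathbf{K}, \qquad \lim_{t\to\infty} \underline{d}_t = \underline{d}, \qquad \lim_{t\to\infty} e_t = e. \qquad (35)$$

Here, the matrix $\mathbf{K}$ satisfies the stationary Kalman-Bucy filter
equation (the algebraic Riccati equation)

$$\mathbf{K} = \mathbf{Q}\mathbf{K}\mathbf{Q}^T + \underline{b}\,\underline{b}^T e^2 - \frac{\mathbf{Q}\mathbf{K}\underline{c}\,\underline{c}^T\mathbf{K}\mathbf{Q}^T}{\underline{c}^T\mathbf{K}\underline{c} + \sigma_W^2}, \qquad (36)$$

where the matrix $\mathbf{Q}$ is defined as $\mathbf{Q} = \mathbf{A} + \underline{b}\,\underline{d}^T$. The
instantaneous channel input power converges as

$$\lim_{t\to\infty} E\left[(X_t)^2 \,|\, \underline{s}_0, y_1^{t-1}\right] = \underline{d}^T \mathbf{K}\, \underline{d} + (e)^2. \qquad (37)$$

□

Proof: Since the (asymptotically) stationary source induces,
in its limit as $t \to \infty$, stationary channel input and
output processes, by definition the Kalman-Bucy filter has a
steady state, and thus the sequences $\mathbf{K}_t$, $\underline{d}_t$ and $e_t$ converge.
The Riccati equation (36) is obtained as the stationary form
of the covariance matrix of the Kalman-Bucy filter. The limit
in (37) follows from (30), (34) and (35). ■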
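
For intuition about the steady state in Lemma 3, the algebraic Riccati equation can be solved by simple fixed-point iteration in the scalar case. The sketch below assumes $M = 1$ with $b = c = 1$, so that the closed-loop gain is a scalar $q = a + d$; this specialization, the function names, and the numerical values are illustrative assumptions rather than part of the paper.

```python
def riccati_map(K, q, e, sigma2):
    """Scalar (M = 1, b = c = 1) Riccati update:
    K -> q^2 K + e^2 - (q K)^2 / (K + sigma2), with q = a + d."""
    return q * q * K + e * e - (q * K) ** 2 / (K + sigma2)


def steady_state_covariance(q, e, sigma2, K0=1.0, tol=1e-12, max_iter=10_000):
    """Iterate the map from K0 until successive values differ by less than tol."""
    K = K0
    for _ in range(max_iter):
        K_next = riccati_map(K, q, e, sigma2)
        if abs(K_next - K) < tol:
            return K_next
        K = K_next
    raise RuntimeError("Riccati iteration did not converge")


K_ss = steady_state_covariance(q=0.9, e=1.0, sigma2=1.0)
residual = abs(K_ss - riccati_map(K_ss, 0.9, 1.0, 1.0))
```

For $|q| < 1$ the scalar map is a contraction on non-negative values, so the iteration converges quickly; for larger state dimensions a dedicated Riccati solver would normally be preferred.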

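Theorem 3 (Feedback capacity for stationary sources):
For a power-constrained Gaussian channel used with $\nu$-time
delayed feedback, the maximal information rate for stationary
sources equals

$$\max\ \frac{1}{2}\log\left(1 + \frac{\underline{c}^T \mathbf{K}\, \underline{c}}{\sigma_W^2}\right), \qquad (38)$$

where the maximization in (38) is taken under the constraints

$$\underline{d}^T \mathbf{K}\, \underline{d} + e^2 = P, \qquad (39)$$

$$\mathbf{K} = \mathbf{Q}\mathbf{K}\mathbf{Q}^T + \underline{b}\,\underline{b}^T e^2 - \frac{\mathbf{Q}\mathbf{K}\underline{c}\,\underline{c}^T\mathbf{K}\mathbf{Q}^T}{\underline{c}^T\mathbf{K}\underline{c} + \sigma_W^2}. \qquad (40)$$

The matrix $\mathbf{Q}$ is defined as $\mathbf{Q} = \mathbf{A} + \underline{b}\,\underline{d}^T$, and the matrix $\mathbf{K}$
is constrained to be non-negative definite. □

Proof: By Lemma 3, for any (asymptotically) stationary
Gauss-Markov source, the sequences $\mathbf{K}_t$, $\underline{d}_t$ and $e_t$ converge
as $t \to \infty$, so (19) and (29) turn into (38) as $n \to \infty$.
Constraint (40) is the algebraic Riccati equation (36). Constraint
(39) is the input power of the stationary source, obtained by
utilizing Lemmas 2 and 3. ■

In general, the optimization problem in Theorem 3 involves
$O(M^2)$ variables and can be conveniently solved
analytically for small $M$ or numerically for large $M$.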
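
As a rough illustration of the numerical route, the sketch below solves a scalar toy version of the optimization. It assumes $M = 1$ with $b = c = 1$ and $\mathbf{A} = a$, eliminates $e^2 = P - d^2 K$ from the power constraint, and treats the Riccati equation as a quadratic in $d$ for each trial value of $K$; the largest feasible $K$ then gives the rate $\frac{1}{2}\log(1 + K/\sigma_W^2)$. The scalar model, the function names, and the grid search are assumptions made for this example only.

```python
import math


def feasible(K, a, P, sigma2, eps=1e-12):
    """True if some (d, e) meets the scalar power and Riccati constraints for K."""
    r = sigma2 * K / (K + sigma2)
    # Riccati with e^2 = P - d^2 K becomes qa*d^2 + qb*d + qc = 0
    qa, qb, qc = r - K, 2.0 * a * r, r * a * a + P - K
    if qa == 0.0:
        return False
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0.0:
        return False
    for sign in (1.0, -1.0):
        d = (-qb + sign * math.sqrt(disc)) / (2.0 * qa)
        if P - d * d * K >= -eps:      # e^2 must be non-negative
            return True
    return False


def max_stationary_rate(a, P, sigma2, steps=20000, K_max=None):
    """Grid search for the largest feasible K; returns 0.5*log(1 + K/sigma2) in nats."""
    K_max = K_max if K_max is not None else 10.0 * (P + sigma2)
    best = 0.0
    for i in range(1, steps + 1):
        K = K_max * i / steps
        if feasible(K, a, P, sigma2):
            best = max(best, K)
    return 0.5 * math.log(1.0 + best / sigma2)


rate_memoryless = max_stationary_rate(a=0.0, P=1.0, sigma2=1.0)
```

With `a = 0.0` the toy channel has no memory and the search returns approximately $\frac{1}{2}\log(1 + P/\sigma_W^2)$, in line with the fact that feedback does not increase the capacity of memoryless channels [1].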

V. Conclusion 

In this paper, we derived the delayed feedback capacity 
of power-constrained stationary sources over linear Gaussian 
channels with ARMA Gaussian noise. We first reformulated 
the linear Gaussian noise channel into a state-space form that 
is suitable for manipulating the delayed feedback information 
rate. Then, we obtained the delayed feedback capacity for 
stationary sources by generalizing and applying a method 
that was originally developed for computing the instantaneous 
feedback capacity. We showed that a feedback-dependent 
Gauss-Markov source achieves the delayed-feedback channel
capacity and that the Kalman-Bucy filter is optimal for
processing the feedback. The delayed-feedback capacity is 
expressible as an optimization problem with constraints on 
the conditional state covariance matrix of the Kalman-Bucy 
filter. 

Appendix 

A. Sketch of Proof for Theorem 1

Let $\mathcal{P}_1$ be any valid feedback-dependent Gaussian source
distribution (not necessarily Markov) defined as

$$\mathcal{P}_1 = \{P_t(\underline{s}_t | \underline{s}_0^{t-1}, y_1^{t-1}),\ t = 1, 2, \ldots\}. \qquad (41)$$

From $\mathcal{P}_1$, we construct a Markov (not necessarily stationary)
source distribution $\mathcal{P}_2$ as

$$\mathcal{P}_2 = \{Q_t(\underline{s}_t | \underline{s}_{t-1}, y_1^{t-1}),\ t = 1, 2, \ldots\}, \qquad (42)$$

where the functions $Q_t(\underline{s}_t | \underline{s}_{t-1}, y_1^{t-1})$ are defined as the
conditional marginal pdfs computed from $\mathcal{P}_1$



t-1 \£.t \£.t 



We next show by induction that the sources $\mathcal{P}_1$ and $\mathcal{P}_2$
induce the same distribution of $\underline{S}_{t-1}$ and $Y_1^t$, i.e.,



(38) P 
l«o). (44)



(47) 



(48) 



For $t = 1$, by the definition of source $\mathcal{P}_2$ we have

p Z%\s () &>tt l^o) = Uo)^|5j (yi ko) (45) 

= Pi(s 1 |s )P yi | S (46) 
Since $\underline{s}_0$ is known, this directly implies

Now, assume that the equality (44) holds up to time $t - 1$,
where $t > 1$; in particular,

^Sk'^lsi- 1 * 1 ^ ^-^^S.V!*- 1 ^-*-^ ^ _1 |^o)(49) 

II Pr(l T laS-'.Wr 1 ) P Y r \Sl_SVr kr-l)^- 3 - (50) 
r=l 

The induction step for time t is simply shown as follows 



Pjplyt-Hs (^=a.l/* _1 l«o)d«*-2 (51) 



(a) 
(54)



T = l 



^Jt ; v *l<? (—1—1 ' 2/* l&>) ' 

where (a) is the result of expanding the definition in (43) for
source $\mathcal{P}_2$ and the induction assumption (50) using the Bayes
rule and substituting them into (51), and (b) is obtained by
simplifying the expression in (52).

Thus, we have shown that the channel states $\underline{S}_{t-1}$ and
outputs $Y_1^t$ induced by sources $\mathcal{P}_1$ and $\mathcal{P}_2$ have the same
distribution. It is therefore clear that the non-Markov source
$\mathcal{P}_1$ and the Markov source $\mathcal{P}_2$ induce the same information rate
according to equality (19).

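B. Sketch of Proof for Theorem 2

Suppose that two different feedback vectors $y_1^{t-1}$ and
$\tilde{y}_1^{t-1}$ (with $y_1^{t-1} \neq \tilde{y}_1^{t-1}$) induce the same posterior channel state pdf
$\alpha_{t-1}(\cdot)$, i.e., for any possible state value $\underline{s}_{t-1} = \underline{\mu}$ we have

$$P_{\underline{S}_{t-1} | \underline{S}_0, Y_1^{t-1}}(\underline{\mu} | \underline{s}_0, y_1^{t-1}) = P_{\underline{S}_{t-1} | \underline{S}_0, Y_1^{t-1}}(\underline{\mu} | \underline{s}_0, \tilde{y}_1^{t-1}). \qquad (55)$$

Now consider two distributions for the source $\underline{S}_\tau$, for $\tau \ge t$,
the first distribution conditioned on $y_1^{t-1}$ and the second
conditioned on $\tilde{y}_1^{t-1}$. If we let these two distributions be equal
to each other for $\tau \ge t$, that is, if

$$\{P_\tau(\underline{s}_\tau | \underline{s}_{\tau-1}, y_1^{t-1}, y_t^{\tau-1}),\ \tau \ge t\} = \{P_\tau(\underline{s}_\tau | \underline{s}_{\tau-1}, \tilde{y}_1^{t-1}, y_t^{\tau-1}),\ \tau \ge t\}, \qquad (56)$$

then we have for any $k \ge t$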



Dt >§t-i \§o> Vx , 



P\rk 



oet-l 



Ut~l)[[Pr(s T \s T _ 1 ,yl 1 )P y \srJyVr\fr-l) 



= Pyf.s^ls^Y*- 1 {yt,s.t-i\s ,yl A ) ■ (57) 
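This shows that for any $k \ge t$ the entropies are equal,

$$h(Y_t^k | \underline{s}_0, y_1^{t-1}) = h(Y_t^k | \underline{s}_0, \tilde{y}_1^{t-1}), \qquad (58)$$

and for any $\tau \ge t$ the powers are equal,

$$E\left[(X_\tau)^2 \,|\, \underline{s}_0, y_1^{t-1}\right] = E\left[(X_\tau)^2 \,|\, \underline{s}_0, \tilde{y}_1^{t-1}\right]. \qquad (59)$$

Therefore, the optimal source distribution for time $\tau \ge t$ when
$y_1^{t-1}$ is the feedback vector must also be optimal when the
alternative vector $\tilde{y}_1^{t-1}$ is the feedback vector, and vice versa.
Since time $t$ is arbitrary, we conclude that, for any $t > 0$, the
function $\alpha_{t-1}(\cdot)$ extracts from $y_1^{t-1}$ all that is necessary for
formulating the optimal source distribution functions
$P_t(\underline{s}_t | \underline{s}_{t-1}, y_1^{t-1})$.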